Continuous sampling from distributed streams

Similar resources

Continuous Sampling from Distributed Streams

A fundamental problem in data management is to draw and maintain a sample of a large data set, for approximate query answering, selectivity estimation, and query planning. With large, streaming data sets, this problem becomes particularly difficult when the data is shared across multiple distributed sites. The main challenge is to ensure that a sample is drawn uniformly across the union of the ...
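
The uniformity requirement described above can be illustrated with a minimal sketch. It is not taken from the paper, whose abstract is truncated here. It assumes a simple random-tag scheme: every element receives an independent uniform tag, and the coordinator keeps the elements with the smallest tags. Because tags ignore which site saw an element, the kept set is a uniform sample of the union. The names `Coordinator`, `Site`, `threshold`, and `offer` are hypothetical, and a site reading the threshold directly stands in for real network messages.

```python
import heapq
import random


class Coordinator:
    """Keeps the sample_size elements with the smallest random tags."""

    def __init__(self, sample_size):
        self.sample_size = sample_size
        self._heap = []  # max-heap on tag, stored as (-tag, seq, item)
        self._seq = 0    # tie-breaker so items are never compared

    def threshold(self):
        # Any tag below this value would enter the sample.
        if len(self._heap) < self.sample_size:
            return 1.0
        return -self._heap[0][0]

    def offer(self, tag, item):
        entry = (-tag, self._seq, item)
        self._seq += 1
        if len(self._heap) < self.sample_size:
            heapq.heappush(self._heap, entry)
        elif tag < self.threshold():
            heapq.heapreplace(self._heap, entry)

    def sample(self):
        return [item for _, _, item in self._heap]


class Site:
    """One stream source; forwards an element only if it could matter."""

    def __init__(self, coordinator, rng):
        self.coordinator = coordinator
        self.rng = rng
        self.messages = 0  # number of elements sent to the coordinator

    def observe(self, item):
        tag = self.rng.random()
        # Assumption: the site knows the current threshold exactly.
        if tag < self.coordinator.threshold():
            self.coordinator.offer(tag, item)
            self.messages += 1


if __name__ == "__main__":
    coordinator = Coordinator(sample_size=5)
    sites = [Site(coordinator, random.Random(i)) for i in range(3)]
    for x in range(1000):
        sites[x % 3].observe(x)
    print(coordinator.sample(), sum(s.messages for s in sites))
```

In this sketch most elements are dropped locally, since an element is forwarded only when its tag beats the current threshold. A real protocol would also have to account for the cost of keeping each site's view of the threshold up to date.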

Optimal Random Sampling from Distributed Streams Revisited

We give an improved algorithm for drawing a random sample from a large data stream when the input elements are distributed across multiple sites which communicate via a central coordinator. At any point in time the set of elements held by the coordinator represents a uniform random sample from the set of all the elements observed so far. When compared with prior work, our algorithms asymptotical...

Continuous Distributed Counting for Non-monotonous Streams

We consider the continual count tracking problem in a distributed environment where the input is an aggregate stream originating from k distinct sites and the updates are allowed to be non-monotonous, i.e. both increments and decrements are allowed. The goal is to continually track the count within a prescribed relative accuracy at the lowest possible communication cost. Specifically...

Weighted Sampling Without Replacement from Data Streams

Weighted sampling without replacement has proved to be a very important tool in designing new algorithms. Efraimidis and Spirakis (IPL 2006) presented an algorithm for weighted sampling without replacement from data streams. Their algorithm works under the assumption of precise computations over the interval [0, 1]. Cohen and Kaplan (VLDB 2008) used similar methods for their bottom-k sketches. ...
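
As a rough illustration of the kind of algorithm this abstract refers to, the sketch below follows the well-known key-based scheme attributed to Efraimidis and Spirakis: each item with weight w draws u uniformly from [0, 1), receives the key u^(1/w), and the k items with the largest keys are kept. The function name `weighted_reservoir_sample` and its parameters are assumptions for this example. It uses ordinary floating-point arithmetic, so it only approximates the precise computation over [0, 1] that the abstract mentions.

```python
import heapq
import random


def weighted_reservoir_sample(stream, k, rng=random):
    """Return up to k items sampled without replacement from
    an iterable of (item, weight) pairs, in one pass."""
    heap = []  # min-heap of (key, index, item); smallest key on top
    for index, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # items with non-positive weight are never sampled
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, index, item))
        elif key > heap[0][0]:
            # New key beats the weakest kept key, so swap it in.
            heapq.heapreplace(heap, (key, index, item))
    return [item for _, _, item in heap]


if __name__ == "__main__":
    data = [("a", 1.0), ("b", 5.0), ("c", 0.5), ("d", 10.0)]
    print(weighted_reservoir_sample(data, k=2, rng=random.Random(42)))
```

The heap holds at most k entries, so memory stays bounded regardless of stream length. Heavier items tend to receive keys closer to 1 and are therefore more likely to be kept.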

On Sampling from Massive Graph Streams

We propose Graph Priority Sampling (GPS), a new paradigm for order-based reservoir sampling from massive graph streams. GPS provides a general way to weight edge sampling according to auxiliary and/or size variables so as to accomplish various estimation goals of graph properties. In the context of subgraph counting, we show how edge sampling weights can be chosen so as to minimize the estimati...

Journal

Journal title: Journal of the ACM

Year: 2012

ISSN: 0004-5411,1557-735X

DOI: 10.1145/2160158.2160163